The Common Information of N Dependent Random Variables 



Wei Liu, Ge Xu, Biao Chen

Department of EECS, Syracuse University

Email: wliu28@syr.edu, gexu@syr.edu, bichen@syr.edu






Abstract: This paper generalizes Wyner's definition of the common information of a pair of random variables to that of N random variables. We prove coding theorems showing that the operational meanings of the common information of two random variables carry over to that of N random variables. As a byproduct of our proof, we show that the Gray-Wyner source coding network can be generalized to N source sequences with N decoders. We also establish a monotone property of Wyner's common information that is in contrast to other notions of common information, specifically Shannon's mutual information and Gács and Körner's common randomness. Examples of computing Wyner's common information of N random variables are also given.

I. Introduction 

Consider a pair of dependent random variables $X$ and $Y$ with joint distribution $P(x,y)$. Characterizing the common information between $X$ and $Y$ has been a topic of research interest in the past decades [1]-[5]. Three classical notions have been reported in the literature.
Shannon's [6] mutual information I(X;Y) 

Shannon's mutual information measures how much uncertainty about one random variable can be reduced by observing the other. When $X$ and $Y$ are independent, $I(X;Y) = 0$, indicating that observing $X$ gives no information about $Y$ and vice versa. Shannon's mutual information carries operational meanings that were instrumental in laying the foundation of information theory.
Gács and Körner's [1] common randomness K(X,Y)

Consider a pair of independent and identically distributed (i.i.d.) random sequences $X^n, Y^n$ with each pair $(X_i, Y_i) \sim P(x,y)$. These two sequences are observed by two separate nodes, which attempt to map their sequences onto a common message set $\mathcal{W}$. Specifically, let $f_n$ and $g_n$ be such mappings, i.e.,

$$f_n: \mathcal{X}^n \to \mathcal{W}, \qquad g_n: \mathcal{Y}^n \to \mathcal{W}.$$

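Define $\epsilon_n = \Pr\{W_1 \neq W_2\}$, where $W_1 = f_n(X^n)$ and $W_2 = g_n(Y^n)$. Gács and Körner's common randomness is defined as

$$K(X,Y) = \lim_{n \to \infty,\ \epsilon_n \to 0} \sup \frac{1}{n} H(W_1),$$

where the supremum is taken over all mappings $f_n, g_n$.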



Gács and Körner's common randomness has found extensive applications in cryptography, e.g., for key generation [7]-[9]. On the other hand, the common randomness notion is rather restrictive, as $K(X,Y)$ equals zero in most cases except for the following special case (or random variable pairs that can be converted to such distributions through relabeling of realizations, i.e., permutation of the joint distribution matrix). Let $X = (X', V)$ and $Y = (Y', V)$, where $X'$, $Y'$, $V$ are independent. Clearly, the common part between $X$ and $Y$ is $V$, and it follows that $K(X,Y) = H(V)$. Note that for this example $I(X;Y) = K(X,Y) = H(V)$.
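Gács and Körner's characterization, which is standard but not restated in this paper, identifies the common part $V$ with the connected components of the bipartite graph on $\mathcal{X} \cup \mathcal{Y}$ that joins $x$ and $y$ whenever $P(x,y) > 0$; $K(X,Y)$ is then the entropy of the component label. As a minimal sketch under that characterization (the function name `gacs_korner` and the dictionary layout of $P(x,y)$ are our own assumptions), the common part can be found with a union-find pass:

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def gacs_korner(pxy):
    """K(X,Y) as the entropy of the connected-component label of the
    bipartite support graph {(x, y): P(x, y) > 0}.

    pxy maps (x, y) -> probability (a hypothetical dictionary layout)."""
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    # Merge each x-node with every y-node it co-occurs with.
    for (x, y), p in pxy.items():
        if p > 0:
            parent[find(("x", x))] = find(("y", y))

    # Probability mass of each component, i.e. the pmf of the common part V.
    mass = defaultdict(float)
    for (x, y), p in pxy.items():
        if p > 0:
            mass[find(("x", x))] += p
    return entropy(mass.values())


# Example from the text: X = (X', V), Y = (Y', V) with X', Y', V independent
# bits; X' and Y' are fair and Pr{V = 1} = pv (hypothetical value).
pv = 0.3
pxy = {((xp, v), (yp, v)): 0.25 * (pv if v else 1 - pv)
       for xp in (0, 1) for yp in (0, 1) for v in (0, 1)}
print(gacs_korner(pxy), entropy([pv, 1 - pv]))  # both equal H(V)
```

For a pmf whose support graph is connected, such as a doubly symmetric binary source with $0 < a_0 < 1/2$, the same function returns 0, matching the remark that $K(X,Y)$ vanishes in most cases.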
Wyner's [4] common information C(X,Y)
Wyner's common information is defined as

$$C(X,Y) = \min_{X - W - Y} I(X,Y;W). \qquad (1)$$



Thus the hidden (or auxiliary) variable $W$ induces a Markov chain $X - W - Y$, or, equivalently, a conditional independence structure in which $X$ and $Y$ are independent given $W$. Wyner gave two operational meanings for the above definition. The first approach is shown in Fig. 1. The encoder observes a pair of sequences $(X^n, Y^n)$ and maps them to three messages $W_0, W_1, W_2$, taking values in alphabets of respective sizes $2^{nR_0}, 2^{nR_1}, 2^{nR_2}$. Decoder 1, upon receiving $(W_0, W_1)$, needs to reproduce $X^n$ reliably, while decoder 2, upon receiving $(W_0, W_2)$, needs to reproduce $Y^n$ reliably. Let $C_1$ be the infimum of all admissible $R_0$ for the system in Fig. 1 such that the total rate $R_0 + R_1 + R_2 \approx H(X,Y)$.

The second approach is shown in Fig. 2. A common input $W$, uniformly distributed on $\mathcal{W} = \{1, \ldots, 2^{nR_0}\}$, is given to two separate processors which are otherwise independent of each other. These processors (random variable generators) generate i.i.d. sequences according to $q_1(x|w)$ and $q_2(y|w)$, respectively. The output sequences of the two processors are denoted by $X^n$ and $Y^n$, respectively. Thus the joint distribution of the output sequences is

$$Q(x^n, y^n) = \sum_{w \in \mathcal{W}} \frac{1}{|\mathcal{W}|}\, q_1(x^n|w)\, q_2(y^n|w). \qquad (2)$$



Define $C_2$ of $(X,Y)$ to be the infimum of rates $R_0$ of the common input such that $Q(x^n, y^n)$ is close to $P(x^n, y^n)$, where closeness is measured by the normalized divergence between the two distributions

$$D_n(P,Q) = \frac{1}{n} \sum_{x^n, y^n} P(x^n, y^n) \log \frac{P(x^n, y^n)}{Q(x^n, y^n)}. \qquad (3)$$

Wyner proved that

$$C_1 = C_2 = C(X,Y). \qquad (4)$$

It was observed in [4] that

$$K(X,Y) \le I(X;Y) \le C(X,Y). \qquad (5)$$
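As a hedged numerical illustration of the chain $K(X,Y) \le I(X;Y) \le C(X,Y)$, the following sketch evaluates all three measures for a doubly symmetric binary source (DSBS) with crossover probability $a_0$, the pairwise model used in Section III. It assumes the closed forms $K(X,Y) = 0$ for $0 < a_0 < 1/2$ (the support graph is connected), $I(X;Y) = 1 - h(a_0)$, and Wyner's $C(X,Y) = 1 + h(a_0) - 2h(a_1)$ with $a_1 = \frac{1}{2}(1 - \sqrt{1 - 2a_0})$ [4]; the helper names are hypothetical.

```python
import math


def h(p):
    """Binary entropy function in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def dsbs_measures(a0):
    """Return (K, I, C) for a DSBS with crossover a0, 0 < a0 < 1/2."""
    a1 = 0.5 * (1 - math.sqrt(1 - 2 * a0))  # BSC crossover with a0 = 2 a1 (1 - a1)
    K = 0.0                    # full support, so the common part is trivial
    I = 1 - h(a0)              # H(X) + H(Y) - H(X, Y)
    C = 1 + h(a0) - 2 * h(a1)  # H(X, Y) - 2 h(a1), Wyner's closed form
    return K, I, C


for a0 in (0.05, 0.1, 0.2, 0.3, 0.4):
    K, I, C = dsbs_measures(a0)
    print(f"a0={a0:.2f}  K={K:.4f}  I={I:.4f}  C={C:.4f}")
```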
Fig. 1. Source coding over a simple network.
Fig. 2. Random variable generators.

Wyner [4] and Witsenhausen [5] also provide several examples of how to calculate the common information $C(X,Y)$. For the example of $X = (X', V)$ and $Y = (Y', V)$ with $(X', Y', V)$ mutually independent, $C(X,Y) = I(X;Y) = K(X,Y) = H(V)$.

The generalization of mutual information to N random variables was first reported in [11]. The generalization stems from the observation that, for a pair of random variables, Shannon's information measures are consistent with the Venn diagram for set operations; a comprehensive treatment is available in [10], [12]. Gács and Körner's common randomness was recently generalized to multiple random variables by Tyagi, Narayan and Gupta in [13], which extends the encoding process in the definition of common randomness to N terminals.

In this paper, we generalize Wyner's common information of a pair of random variables to that of N dependent random variables. We show that the operational meanings defined by both approaches remain valid. Moreover, we establish a monotone property of this generalization that runs counter to the intuitive notion of 'common' information. Specifically, we show that the common information does not decrease as the number of variables increases, while the marginal distribution of the original variables is kept the same. This is different from the other two notions of common information. Examples of evaluating $C(X_1, X_2, \ldots, X_N)$ are given for circularly symmetric binary sources, and the asymptotic behavior is also studied.

The rest of this paper is organized as follows. Section II 
gives the problem formulation and main results. Section III 
gives some examples and discussions. Section IV concludes 
the paper. 

II. Problem Statement and Main Results 

Let $X_1, X_2, \ldots, X_N$ be random variables that take values in the finite alphabets $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_N$ with joint distribution $P(x_1, x_2, \ldots, x_N)$. Our generalization of Wyner's common information defines a similar measure for N random variables by preserving the conditional independence structure through the introduction of an auxiliary random variable. Specifically, we define

$$C(X_1, X_2, \ldots, X_N) \triangleq \inf I(X_1, X_2, \ldots, X_N; W), \qquad (6)$$

where the infimum is taken over all joint distributions of $(X_1, X_2, \ldots, X_N, W)$ such that

$$\sum_{w} P(x_1, x_2, \ldots, x_N, w) = P(x_1, x_2, \ldots, x_N), \qquad (7)$$

$$P(x_1, \ldots, x_N | w) = \prod_{i=1}^{N} P(x_i | w). \qquad (8)$$

Thus the marginal distribution of $(X_1, X_2, \ldots, X_N)$ is $P(x_1, x_2, \ldots, x_N)$, and $X_1, \ldots, X_N$ are conditionally independent given $W$.
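Checking a candidate auxiliary variable against (7) and (8) is mechanical. The sketch below (our own function name and dictionary layout, not code from the paper) takes a finite joint pmf of $(X_1, \ldots, X_N, W)$, verifies the conditional independence (8), and returns $I(X_1, \ldots, X_N; W)$, which by (6) upper-bounds $C(X_1, \ldots, X_N)$. The usage lines assume the BSC construction that reappears in Section III, with a hypothetical crossover probability $a_1 = 0.1$.

```python
import math
from collections import defaultdict
from itertools import product


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def objective_if_feasible(pxw, tol=1e-12):
    """pxw maps ((x1, ..., xN), w) -> probability.
    Returns I(X1..XN; W) if condition (8) holds, else raises ValueError."""
    pw, px = defaultdict(float), defaultdict(float)
    for (x, w), p in pxw.items():
        pw[w] += p
        px[x] += p  # marginal of (X1..XN), cf. (7)
    n_vars = len(next(iter(pxw))[0])
    for w, qw in pw.items():
        if qw <= 0:
            continue
        # Conditional marginals P(x_i | w) for each coordinate i.
        marg = [defaultdict(float) for _ in range(n_vars)]
        for (x, ww), p in pxw.items():
            if ww == w:
                for i, xi in enumerate(x):
                    marg[i][xi] += p / qw
        # Condition (8): P(x | w) must factor into the product of its marginals.
        for x in product(*(list(m) for m in marg)):
            lhs = pxw.get((x, w), 0.0) / qw
            rhs = math.prod(marg[i][x[i]] for i in range(n_vars))
            if abs(lhs - rhs) > tol:
                raise ValueError(f"X is not conditionally independent given W={w}")
    # I(X; W) = H(X) + H(W) - H(X, W).
    return entropy(px.values()) + entropy(pw.values()) - entropy(pxw.values())


# Usage: W ~ Bernoulli(1/2) feeding three independent BSC(a1) channels.
a1 = 0.1
pxw = {(x, w): 0.5 * math.prod(1 - a1 if xi == w else a1 for xi in x)
       for x in product((0, 1), repeat=3) for w in (0, 1)}
print(objective_if_feasible(pxw))
```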

We now give formal definitions of $C_1$ and $C_2$ for N random variables. Consider N length-$n$ source sequences $(X_1^n, X_2^n, \ldots, X_N^n)$ that are i.i.d. over time with $(X_{1i}, X_{2i}, \ldots, X_{Ni}) \sim P(x_1, x_2, \ldots, x_N)$, i.e.,

$$P^{(n)}(x_1^n, x_2^n, \ldots, x_N^n) = \prod_{i=1}^{n} P(x_{1i}, x_{2i}, \ldots, x_{Ni}). \qquad (9)$$

For the Gray-Wyner source coding network, we start with the definition of the encoder and decoders.

Definition 1: An $(n, M_0, M_1, \ldots, M_N)$ code consists of the following:

• An encoder mapping
$$f: \mathcal{X}_1^n \times \mathcal{X}_2^n \times \cdots \times \mathcal{X}_N^n \to \mathcal{M}_0 \times \mathcal{M}_1 \times \cdots \times \mathcal{M}_N,$$
where $\mathcal{M}_i = \{1, 2, \ldots, M_i\}$.

• N decoders $g_i$, for $i = 1, 2, \ldots, N$,
$$g_i: \mathcal{M}_i \times \mathcal{M}_0 \to \mathcal{X}_i^n. \qquad (10)$$

The probability of error is defined as

$$P_e^{(n)} = \Pr\{(\hat{X}_1^n, \hat{X}_2^n, \ldots, \hat{X}_N^n) \neq (X_1^n, X_2^n, \ldots, X_N^n)\}, \qquad (11)$$

where $\hat{X}_i^n = g_i(M_i, M_0)$ for $i = 1, \ldots, N$.

Definition 2: A number $R_0$ is said to be achievable if for any $\epsilon > 0$, we can find an $n$ sufficiently large such that there exists an $(n, M_0, M_1, \ldots, M_N)$ code with

$$M_0 \le 2^{nR_0}, \qquad (12)$$

$$P_e^{(n)} \le \epsilon, \qquad (13)$$

$$\frac{1}{n} \sum_{i=0}^{N} \log M_i \le H(X_1, X_2, \ldots, X_N) + \epsilon. \qquad (14)$$

As in the case of two random variables, $C_1$ is defined as the infimum of all achievable $R_0$.

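For the second approach of approximating the joint distribution, we again start with the following definition.

Definition 3: An $(n, M, \Delta)$ generator consists of the following:

• a message set $\mathcal{W} = \{1, 2, \ldots, M\}$;

• for all $w \in \mathcal{W}$, conditional probability distributions $q_i^{(n)}(x_i^n | w)$, for $i = 1, 2, \ldots, N$, which define the probability distribution on $\mathcal{X}_1^n \times \mathcal{X}_2^n \times \cdots \times \mathcal{X}_N^n$

$$Q^{(n)}(x_1^n, x_2^n, \ldots, x_N^n) = \sum_{w \in \mathcal{W}} \frac{1}{M} \prod_{i=1}^{N} q_i^{(n)}(x_i^n | w). \qquad (15)$$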

Thus the N processors serve as random number generators, each generating an i.i.d. sequence $X_i^n$ according to $q(x_i|w)$, and the outputs of the processors follow the joint distribution defined in (15). Let



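$$\Delta = D_n(P^{(n)}; Q^{(n)}) = \frac{1}{n} \sum_{x_i^n \in \mathcal{X}_i^n,\ i = 1, 2, \ldots, N} P^{(n)}(x_1^n, \ldots, x_N^n) \log \frac{P^{(n)}(x_1^n, \ldots, x_N^n)}{Q^{(n)}(x_1^n, \ldots, x_N^n)}, \qquad (16)$$

where $P^{(n)}$ and $Q^{(n)}$ are defined as in (9) and (15), respectively.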
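For small $n$ the quantities in (9), (15) and (16) can be evaluated by brute force. The sketch below (hypothetical function and argument names; binary alphabets assumed) computes $\Delta = D_n(P^{(n)}; Q^{(n)})$ for two toy generators driven by a doubly symmetric binary source. The first uses all $2^n$ binary sequences $w^n$ as messages and memoryless BSC processors; it reproduces $P^{(n)}$ exactly, so $\Delta = 0$, but at rate 1 bit per symbol. The second uses a single message and independent fair coin flips, for which $\Delta$ equals $I(X_1;X_2) = 1 - h(a_0)$.

```python
import math
from itertools import product


def D_n(P1, messages, q, n, N, alphabet=(0, 1)):
    """Normalized divergence (16) between P^(n) of (9) and Q^(n) of (15).
    P1       : dict, per-letter joint pmf (x1, ..., xN) -> probability
    messages : list of generator messages, used uniformly (M = len(messages))
    q        : q(i, xi_seq, w) = probability that processor i emits xi_seq given w
    """
    M = len(messages)
    total = 0.0
    for seqs in product(product(alphabet, repeat=n), repeat=N):
        # P^(n): product over time j of the per-letter joint distribution.
        p = math.prod(P1[tuple(seqs[i][j] for i in range(N))] for j in range(n))
        if p == 0:
            continue
        # Q^(n): uniform mixture over messages of independent processor outputs.
        qv = sum(math.prod(q(i, seqs[i], w) for i in range(N)) for w in messages) / M
        total += p * math.log2(p / qv)
    return total / n


a1, n, N = 0.1, 3, 2  # hypothetical parameters
bsc = lambda x, w: 1 - a1 if x == w else a1
P1 = {(x1, x2): sum(0.5 * bsc(x1, w) * bsc(x2, w) for w in (0, 1))
      for x1 in (0, 1) for x2 in (0, 1)}

# Generator 1: M = 2^n messages w^n, memoryless BSC processors.
exact = D_n(P1, list(product((0, 1), repeat=n)),
            lambda i, xs, ws: math.prod(bsc(x, w) for x, w in zip(xs, ws)), n, N)
# Generator 2: M = 1 message, each processor outputs fair coin flips.
indep = D_n(P1, [None], lambda i, xs, w: 0.5 ** n, n, N)
print(exact, indep)
```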

Definition 4: A number $R$ is said to be achievable if for all $\epsilon > 0$, we can find an $n$ sufficiently large such that there exists an $(n, M, \Delta)$ generator with $M \le 2^{nR}$ and $\Delta \le \epsilon$. We define $C_2$ as the infimum of all achievable $R$.

The main result of this paper is the following theorem.

Theorem 1:

$$C_1 = C_2 = C(X_1, X_2, \ldots, X_N). \qquad (17)$$



The proof of Theorem 1 is given in the Appendix. Thus both $C_1$ and $C_2$ admit a single-letter characterization, which coincides with $C(X_1, \ldots, X_N)$.

III. Examples and discussions 

We start with the following example. Let $X = (X', U, V)$, $Y = (Y', V, W)$ and $Z = (Z', W, U)$, where the random variables $X', Y', Z', U, V, W$ are mutually independent. It is easy to show that for this example

$$I(X;Y;Z) = K(X,Y,Z) = 0,$$

whereas

$$C(X,Y,Z) = H(UVW).$$

On the other hand,

$$C(X,Y) = H(V), \qquad C(X,Z) = H(U), \qquad C(Y,Z) = H(W).$$
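Several of these identities can be checked numerically. The sketch below assumes binary $X', Y', Z', U, V, W$ with arbitrary (hypothetical) biases, computes the co-information $I(X;Y;Z)$ by inclusion-exclusion, compares $I(X;Y)$ with $H(V)$, and evaluates $I(X,Y,Z;W^*)$ for the feasible choice $W^* = (U, V, W)$, which makes $X$, $Y$, $Z$ conditionally independent. The last value only confirms $C(X,Y,Z) \le H(UVW)$; the matching lower bound is not checked here. In the code, the sixth variable is indexed as `WV` to avoid confusion with the auxiliary variable $W$.

```python
import math
from collections import defaultdict
from itertools import product


def entropy_of(joint, f):
    """Entropy in bits of the random variable f(outcome) under the pmf joint."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[f(outcome)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


# Hypothetical Pr{bit = 1} for the independent bits X', Y', Z', U, V, W.
bias = [0.2, 0.35, 0.5, 0.1, 0.25, 0.4]
joint = {bits: math.prod(b if v else 1 - b for b, v in zip(bias, bits))
         for bits in product((0, 1), repeat=6)}

XP, YP, ZP, U, V, WV = range(6)  # coordinate indices; WV is the sixth bit "W"
X = lambda o: (o[XP], o[U], o[V])
Y = lambda o: (o[YP], o[V], o[WV])
Z = lambda o: (o[ZP], o[WV], o[U])
V_ONLY = lambda o: o[V]
UVW = lambda o: (o[U], o[V], o[WV])


def H(*fs):
    """Joint entropy of several functions of the outcome."""
    return entropy_of(joint, lambda o: tuple(f(o) for f in fs))


coinfo = H(X) + H(Y) + H(Z) - H(X, Y) - H(X, Z) - H(Y, Z) + H(X, Y, Z)
i_xy = H(X) + H(Y) - H(X, Y)                    # should equal H(V)
i_star = H(X, Y, Z) + H(UVW) - H(X, Y, Z, UVW)  # I(X,Y,Z; W*) with W* = (U,V,W)
print(coinfo, i_xy, H(V_ONLY), i_star, H(UVW))
```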

What is interesting is that the inclusion of an additional variable increases the common information. This is somewhat surprising: if the information is common, it ought to be non-increasing when more random variables are included. In fact, the opposite holds in general, as the following result shows:

Lemma 1: Let $(X_1, \ldots, X_N) \sim P(x_1, \ldots, x_N)$. For any two sets $A, B$ that satisfy $A \subseteq B \subseteq \mathcal{N} = \{1, 2, \ldots, N\}$,

$$C(X_A) \le C(X_B), \qquad (18)$$

where $X_A = \{X_i : i \in A\}$ and $X_B = \{X_i : i \in B\}$.



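Proof: Let $W'$ be the $W$ that achieves $C(X_B)$, i.e., $I(W'; X_B) = \inf I(W; X_B)$. Since $A \subseteq B$, the conditional independence of $X_B$ given $W'$ implies that the variables in $X_A$ are conditionally independent given $W'$. Thus

$$C(X_B) = I(X_B; W') \ge I(X_A; W') \ge \inf I(X_A; W) = C(X_A),$$

where the first inequality follows from the chain rule, since $X_A$ is a sub-collection of $X_B$, and the infimum is taken over all $W$ such that the variables in $X_A$ are conditionally independent given $W$. □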

This monotone property perhaps suggests that the name common information, while meaningful for a pair of variables, no longer suits the generalization to N variables. We note that Gács and Körner's common randomness satisfies the opposite monotone property

$$K(X_A) \ge K(X_B),$$

while there is no definitive inequality relationship for mutual information.

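As a consequence, for any $N \ge 2$ random variables, combining (18), the two-variable inequality $K(X_1, X_2) \le C(X_1, X_2)$, and the monotone property of $K$ gives $C(X_1, \ldots, X_N) \ge C(X_1, X_2) \ge K(X_1, X_2) \ge K(X_1, \ldots, X_N)$, i.e.,

$$C(X_1, X_2, \ldots, X_N) \ge K(X_1, X_2, \ldots, X_N).$$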

We now examine another example in which Wyner's common information increases as the number of observations increases. Moreover, the common information eventually converges, and the asymptote suggests that the notion of common information may have potential applications in certain inference problems.

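Consider first the example of three binary random variables $X_1, X_2, X_3$ with joint distribution

$$P(x_1, x_2, x_3) = \begin{cases} \frac{1}{2} - \frac{3}{4} a_0, & \text{if } x_1 = x_2 = x_3, \\ \frac{1}{4} a_0, & \text{otherwise}, \end{cases} \qquad (19)$$

where the parameter $a_0$ satisfies $0 < a_0 < 1/2$. It can be easily verified that

$$\Pr\{X_i = 0\} = \frac{1}{2} \qquad (20)$$

for $i = 1, 2, 3$, and that for $1 \le i, j \le 3$, $i \neq j$,

$$\Pr(X_i = x_i, X_j = x_j) = \frac{1}{2}(1 - a_0)\, \delta_{x_i, x_j} + \frac{1}{2} a_0 (1 - \delta_{x_i, x_j}), \qquad (21)$$

where $\delta_{a,b} = 1$ if $a = b$ and $0$ otherwise.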

Thus, each pair $(X_i, X_j)$, $i \neq j$, can be viewed as a doubly symmetric binary source as defined in [4]. We refer to this set of exchangeable binary sources as a circularly symmetric binary source. For such a circularly symmetric binary source $(X_1, X_2, X_3)$ with joint distribution given in (19) and random variables $(X_1, X_2, X_3, W)$ that satisfy (7) and (8), we have the following lemma.

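Lemma 2:

$$H(X_1|W) + H(X_2|W) + H(X_3|W) \le 3h(a_1), \qquad (22)$$

where $a_1 = \frac{1}{2}\left(1 - \sqrt{1 - 2a_0}\right)$ and $h(\cdot)$ denotes the binary entropy function.

Fig. 3. A simple Bayesian graph model.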



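This lemma is a direct consequence of Wyner's result on the doubly symmetric binary source [4]: for each pair $i \neq j$, (8) implies the Markov chain $X_i - W - X_j$, and Wyner's result gives $H(X_i|W) + H(X_j|W) \le 2h(a_1)$; summing over the three pairs yields (22). Therefore, we have

$$\begin{aligned} I(X_1X_2X_3; W) &= H(X_1X_2X_3) - H(X_1X_2X_3|W) \\ &= H(X_1X_2X_3) - \sum_{i=1}^{3} H(X_i|W) \\ &\ge H(X_1X_2X_3) - 3h(a_1) \\ &= 1 + h(a_0) + a_0 + (1 - a_0)\, h\!\left(\frac{a_0}{2(1 - a_0)}\right) - 3h(a_1), \end{aligned} \qquad (23)$$

where the second equality follows from (8).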

This lower bound can indeed be achieved by choosing the following random variables. Let $W$ be a random variable with $p_W(0) = p_W(1) = 1/2$, i.e., a Bernoulli(1/2) random variable. Let each $X_i$ be the output of a binary symmetric channel (BSC) with crossover probability $a_1$ and input $W$. The channels share the common input $W$ but are otherwise independent of each other. This is illustrated by the simple Bayesian graph model in Fig. 3 with $N = 3$, where each link represents a BSC with crossover probability $a_1$. Since $a_0 = 2a_1(1 - a_1)$, this model induces exactly the distribution (19), and $H(X_i|W) = h(a_1)$ for each $i$, so (22) and hence (23) hold with equality.
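A quick consistency check of this construction, as a sketch with hypothetical helper names and a hypothetical $a_0 = 0.3$: drive three BSCs with crossover $a_1 = \frac{1}{2}(1 - \sqrt{1 - 2a_0})$ from a common fair bit, compare the result with (19), and verify the pairwise marginal (21).

```python
import math
from itertools import product


def bsc_joint(a1, N):
    """P(x_1..x_N) when W ~ Bernoulli(1/2) drives N independent BSC(a1)s."""
    return {x: sum(0.5 * math.prod(1 - a1 if xi == w else a1 for xi in x)
                   for w in (0, 1))
            for x in product((0, 1), repeat=N)}


def circ_sym_joint(a0):
    """The circularly symmetric distribution (19)."""
    return {x: (0.5 - 0.75 * a0) if len(set(x)) == 1 else 0.25 * a0
            for x in product((0, 1), repeat=3)}


a0 = 0.3                                  # hypothetical, 0 < a0 < 1/2
a1 = 0.5 * (1 - math.sqrt(1 - 2 * a0))    # a1 as defined in Lemma 2
P_bsc, P19 = bsc_joint(a1, 3), circ_sym_joint(a0)
print(max(abs(P_bsc[x] - P19[x]) for x in P19))

# Pairwise marginal of (X_1, X_2), to compare with (21).
pair = {(u, v): sum(p for x, p in P19.items() if x[:2] == (u, v))
        for u in (0, 1) for v in (0, 1)}
print(pair)
```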

Thus, the common information of this circularly symmetric binary source is

$$C(X_1, X_2, X_3) = 1 + a_0 + h(a_0) + (1 - a_0)\, h\!\left(\frac{a_0}{2(1 - a_0)}\right) - 3h(a_1). \qquad (24)$$
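The closed form (24) can be cross-checked against a direct evaluation of $H(X_1X_2X_3)$ from (19). The sketch below relies on the achievability argument above, so that $C(X_1, X_2, X_3) = H(X_1X_2X_3) - 3h(a_1)$; the function names are hypothetical.

```python
import math
from itertools import product


def h(p):
    """Binary entropy function in bits."""
    return 0.0 if p <= 0 or p >= 1 else -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def c3_closed_form(a0):
    """Right-hand side of (24)."""
    a1 = 0.5 * (1 - math.sqrt(1 - 2 * a0))
    return 1 + a0 + h(a0) + (1 - a0) * h(a0 / (2 * (1 - a0))) - 3 * h(a1)


def c3_direct(a0):
    """H(X1 X2 X3) evaluated directly from (19), minus 3 h(a1)."""
    a1 = 0.5 * (1 - math.sqrt(1 - 2 * a0))
    probs = [(0.5 - 0.75 * a0) if len(set(x)) == 1 else 0.25 * a0
             for x in product((0, 1), repeat=3)]
    joint_h = -sum(p * math.log2(p) for p in probs if p > 0)
    return joint_h - 3 * h(a1)


for a0 in (0.1, 0.25, 0.4):
    print(a0, c3_closed_form(a0), c3_direct(a0))
```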

Notice that any pair $(X_i, X_j)$ is a doubly symmetric binary source [4]; therefore, writing $(X, Y, Z)$ for $(X_1, X_2, X_3)$,

$$C(X,Y) = 1 + h(a_0) - 2h(a_1).$$

It is straightforward to check that

$$C(X,Y,Z) > C(X,Y)$$

when $0 < a_0 < 1/2$. This is also shown numerically in Fig. 4.
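The comparison in Fig. 4 can be reproduced approximately by tabulating $C(X,Y)$ and (24) on a grid; the grid and function names below are our own choices, not the paper's plotting code. Under the BSC model of Fig. 3 the gap equals $C(X,Y,Z) - C(X,Y) = H(X_3|X_1,X_2) - h(a_1) = I(X_3;W|X_1,X_2)$, which is positive for $0 < a_1 < 1/2$.

```python
import math


def h(p):
    """Binary entropy function in bits."""
    return 0.0 if p <= 0 or p >= 1 else -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def c_pair(a0):
    """Wyner's C(X, Y) for a doubly symmetric binary source."""
    a1 = 0.5 * (1 - math.sqrt(1 - 2 * a0))
    return 1 + h(a0) - 2 * h(a1)


def c_triple(a0):
    """C(X, Y, Z) from (24)."""
    a1 = 0.5 * (1 - math.sqrt(1 - 2 * a0))
    return 1 + a0 + h(a0) + (1 - a0) * h(a0 / (2 * (1 - a0))) - 3 * h(a1)


for k in range(1, 10):
    a0 = k / 20  # 0.05, 0.10, ..., 0.45
    print(f"a0={a0:.2f}  C(X,Y)={c_pair(a0):.4f}  C(X,Y,Z)={c_triple(a0):.4f}")
```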
We now study the generalization of the above example to arbitrary N, and in particular the asymptotic value of the common information for circularly symmetric binary sources.

Fig. 4. Comparison of common information $C(X,Y)$ and $C(X,Y,Z)$.

Consider N binary random variables $X_1, X_2, \ldots, X_N$ with joint distribution $P(x_1, x_2, \ldots, x_N)$ generated by an underlying Bayesian graph model as in Fig. 3, where $W$ is a Bernoulli(1/2) random variable and each $X_i$, $i = 1, 2, \ldots, N$, is the output of a BSC with crossover probability $a_1$ ($0 < a_1 < 1/2$) and common input $W$. Hence, for $x_1, x_2, \ldots, x_N \in \{0, 1\}$,

$$P(x_1, x_2, \ldots, x_N) = \sum_{w \in \{0,1\}} \frac{1}{2} \prod_{i=1}^{N} p_i(x_i|w), \qquad (25)$$

where for each $i = 1, 2, \ldots, N$, $p_i(x_i|w) = 1 - a_1$ if $x_i = w$ and $a_1$ otherwise.
Similarly, we have

$$\sum_{i=1}^{N} H(X_i|W) \le N h(a_1) \qquad (26)$$

for any random variable $W$ that satisfies (7) and (8).

Therefore, $C(X_1, X_2, \ldots, X_N)$ can be lower bounded by

$$C(X_1, X_2, \ldots, X_N) \ge H(X_1, X_2, \ldots, X_N) - N h(a_1). \qquad (27)$$

On the other hand, the above lower bound is achieved by exactly the same $W$ as in the above Bayesian model. Hence, we have

$$C(X_1, X_2, \ldots, X_N) = H(X_1, X_2, \ldots, X_N) - N h(a_1), \qquad (28)$$

where $H(X_1, X_2, \ldots, X_N)$ can be calculated from (25).
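Because (25) depends on $(x_1, \ldots, x_N)$ only through the number $k$ of ones, $H(X_1, \ldots, X_N)$ in (28) reduces to a sum over $k$ with binomial multiplicities. The sketch below (hypothetical function names) evaluates (28) this way. It is meant for $N \ge 2$, the range in which the pairwise bound behind (26) applies; for a single variable, $C(X_1) = 0$ trivially.

```python
import math


def h(p):
    """Binary entropy function in bits."""
    return 0.0 if p <= 0 or p >= 1 else -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def joint_entropy(a1, N):
    """H(X_1..X_N) under (25); each x with k ones has the same probability."""
    H = 0.0
    for k in range(N + 1):
        p = 0.5 * ((1 - a1) ** (N - k) * a1 ** k + a1 ** (N - k) * (1 - a1) ** k)
        if p > 0:
            H -= math.comb(N, k) * p * math.log2(p)
    return H


def common_info(a1, N):
    """C(X_1..X_N) from (28), intended for N >= 2."""
    return joint_entropy(a1, N) - N * h(a1)


a1 = 0.1  # hypothetical crossover probability
for N in (2, 3, 5, 10):
    print(N, round(common_info(a1, N), 6))
```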

Now consider the above model with increasing N. For any $\epsilon > 0$ and $a_1 < 1/2$, it is clear that

$$H(W | X_1, X_2, \ldots, X_N) \le \epsilon$$

for N sufficiently large. This can be established by Fano's inequality, as one can estimate $W$ with arbitrary reliability given $X_1, \ldots, X_N$ for sufficiently large N (e.g., by a majority vote). Therefore,

$$\begin{aligned} C(X_1, X_2, \ldots, X_N) &= H(X_1, X_2, \ldots, X_N) - N h(a_1) \\ &= H(X_1, X_2, \ldots, X_N, W) - N h(a_1) - H(W | X_1, X_2, \ldots, X_N) \\ &\ge H(W) - \epsilon, \end{aligned} \qquad (29)$$

where the last step follows from $H(X_1, X_2, \ldots, X_N | W) = N h(a_1)$. On the other hand, since this $W$ satisfies (7) and (8),

$$C(X_1, \ldots, X_N) \le I(X_1, \ldots, X_N; W) \le H(W)$$

for any N. Thus, for $a_1 < 1/2$,

$$\lim_{N \to \infty} C(X_1, X_2, \ldots, X_N) = H(W) = 1.$$

If $a_1 = 1/2$, then $X_1, \ldots, X_N$ are mutually independent, and hence $C(X_1, \ldots, X_N) = 0$.
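The limit can be observed numerically. Since $H(X_1, \ldots, X_N, W) = 1 + N h(a_1)$, (28) can be rewritten as $C(X_1, \ldots, X_N) = 1 - H(W | X_1, \ldots, X_N)$, and $H(W | X_1, \ldots, X_N)$ can be computed stably from the posterior of $W$ given the number of ones among $X_1, \ldots, X_N$. The sketch below uses a hypothetical $a_1 = 0.2$ and hypothetical function names.

```python
import math


def h(p):
    """Binary entropy function in bits."""
    return 0.0 if p <= 0 or p >= 1 else -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def cond_entropy_W(a1, N):
    """H(W | X_1..X_N) for the BSC model of Fig. 3 (W ~ Bernoulli(1/2)).
    Every x with k ones has the same posterior Pr{W = 0 | x}."""
    total = 0.0
    for k in range(N + 1):
        like0 = (1 - a1) ** (N - k) * a1 ** k   # P(x | W = 0)
        like1 = a1 ** (N - k) * (1 - a1) ** k   # P(x | W = 1)
        px = 0.5 * (like0 + like1)
        if px > 0:
            total += math.comb(N, k) * px * h(like0 / (like0 + like1))
    return total


a1 = 0.2
for N in (2, 5, 10, 20, 50, 100):
    hw = cond_entropy_W(a1, N)
    # C(X_1..X_N) = H(W) - H(W | X_1..X_N) = 1 - hw, by (28).
    print(f"N={N:4d}  H(W|X^N)={hw:.3e}  C={1 - hw:.6f}")
```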



IV. Conclusions 

This paper generalized Wyner's common information, defined for a pair of random variables, to that of N dependent random variables. We showed that it is the minimum common information rate $R_0$ needed for N separate decoders to recover their intended sources losslessly while keeping the total rate close to the entropy bound. It is also equivalent to the smallest rate of the common input to N independent processors (random number generators) such that the output distribution is approximately the same as the given joint distribution. It was shown that such a generalization leads to the phenomenon of 'common' information being non-decreasing as the number of sources increases.

For the example of circularly symmetric binary sources, we showed that the common information not only increases as N grows, but also eventually converges to the entropy of the $W$ that achieves $C(X_1, \ldots, X_N)$.

Appendix 

In this appendix, we give the proof of Theorem 1. First, as in [4], we define a quantity $\Gamma(\delta_1, \delta_2)$ that plays an important role in the proof.

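Let $(X_1, X_2, \ldots, X_N) \sim P(x_1, x_2, \ldots, x_N)$, where $X_1, \ldots, X_N$ take values in the finite alphabets $\mathcal{X}_1, \ldots, \mathcal{X}_N$. Let $(X_1, X_2, \ldots, X_N, W)$ be an $(N+1)$-tuple of random variables, where $X_1 \in \mathcal{X}_1, X_2 \in \mathcal{X}_2, \ldots, X_N \in \mathcal{X}_N$ and $W \in \mathcal{W}$, a finite set. Denote the marginal distribution of $(X_1, X_2, \ldots, X_N)$ by

$$Q(x_1, x_2, \ldots, x_N) = \Pr(X_1 = x_1, X_2 = x_2, \ldots, X_N = x_N), \qquad (30)$$

for $x_i \in \mathcal{X}_i$, $i = 1, 2, \ldots, N$. For any $\delta_1, \delta_2 \ge 0$, define

$$\Gamma(\delta_1, \delta_2) = \sup H(X_1, X_2, \ldots, X_N | W), \qquad (31)$$

where the supremum is taken over all $(N+1)$-tuples $(X_1, X_2, \ldots, X_N, W)$ that satisfy

$$D(P;Q) \le \delta_1, \qquad (32)$$

$$\sum_{i=1}^{N} H(X_i|W) - H(X_1, X_2, \ldots, X_N | W) \le \delta_2, \qquad (33)$$

with

$$D(P;Q) = \sum P(x_1, x_2, \ldots, x_N) \log \frac{P(x_1, x_2, \ldots, x_N)}{Q(x_1, x_2, \ldots, x_N)}.$$

For $\delta_1 = \delta_2 = 0$, conditions (32) and (33) reduce to (7) and (8), and $I(X_1, \ldots, X_N; W) = H(X_1, \ldots, X_N) - H(X_1, \ldots, X_N | W)$. It follows that $C(X_1, X_2, \ldots, X_N)$ as defined in (6) is equivalent to

$$C(X_1, X_2, \ldots, X_N) = H(X_1, X_2, \ldots, X_N) - \Gamma(0, 0). \qquad (34)$$

The following lemma gives some properties of $\Gamma(\delta_1, \delta_2)$.

Lemma 3: 1) For all $\delta_1, \delta_2 \ge 0$, there exists an $(N+1)$-tuple $(X_1, X_2, \ldots, X_N, W)$ such that (32) and (33) are satisfied and

$$\Gamma(\delta_1, \delta_2) = H(X_1, X_2, \ldots, X_N | W). \qquad (35)$$

Moreover, for $\delta_1 = \delta_2 = 0$,

$$|\mathcal{W}| \le \prod_{i=1}^{N} |\mathcal{X}_i|. \qquad (36)$$

2) $\Gamma(\delta_1, \delta_2)$ is a concave function of $(\delta_1, \delta_2)$, and it is continuous for all $\delta_1, \delta_2 \ge 0$.

3) For $\delta \ge 0$, define $\Gamma_1(\delta) = \Gamma(0, \delta)$ and $\Gamma_2(\delta) = \Gamma(\delta, 0)$; then $\Gamma_1(\delta)$ and $\Gamma_2(\delta)$ are concave and continuous for $\delta \ge 0$.

The proof of Lemma 3 follows similarly to the proof of Theorem 4.4 in [4].

A. Proof of $C_1 = C(X_1, X_2, \ldots, X_N)$

In this section, we prove the first part of Theorem 1, that is, $C_1 = C(X_1, X_2, \ldots, X_N)$. We first prove the converse part, namely that any $R_0$ that is achievable for the Gray-Wyner source coding network satisfies $R_0 \ge C(X_1, \ldots, X_N)$:

Theorem 2 (Converse):

$$C_1 \ge C(X_1, X_2, \ldots, X_N). \qquad (37)$$

To prove the converse, first let $(f, g_i)$, $i = 1, 2, \ldots, N$, be any $(n, M_0, M_1, \ldots, M_N)$ code that satisfies (12), (13) and (14). Then, we have

$$\begin{aligned} \log M_0 &\ge H(M_0) && (38)\\ &\ge I(X_1^n X_2^n \cdots X_N^n; M_0) && (39)\\ &= H(X_1^n X_2^n \cdots X_N^n) - H(X_1^n X_2^n \cdots X_N^n | M_0) && (40)\\ &= nH(X_1 X_2 \cdots X_N) - \sum_{j=1}^{n} H(X_{1j} X_{2j} \cdots X_{Nj} | W_j), && (41) \end{aligned}$$

where $W_j = (M_0, X_1^{j-1}, X_2^{j-1}, \ldots, X_N^{j-1})$ and $X_i^{j-1} = (X_{i1}, X_{i2}, \ldots, X_{i,j-1})$ for $i = 1, 2, \ldots, N$; (41) follows from (9) and the chain rule.

Notice that the $(N+1)$-tuple $(X_{1j}, X_{2j}, \ldots, X_{Nj}, W_j)$ satisfies conditions (32) and (33) with $\delta_1 = 0$, since $(X_{1j}, \ldots, X_{Nj}) \sim P(x_1, \ldots, x_N)$, and

$$\delta_2 = \delta_2^{(j)} \triangleq \sum_{i=1}^{N} H(X_{ij} | W_j) - H(X_{1j} X_{2j} \cdots X_{Nj} | W_j). \qquad (42)$$

Hence, by the definition of $\Gamma(\delta_1, \delta_2)$, we have

$$H(X_{1j} X_{2j} \cdots X_{Nj} | W_j) \le \Gamma(0, \delta_2^{(j)}) = \Gamma_1(\delta_2^{(j)}). \qquad (43)$$

Substituting (43) into (41), we get

$$\begin{aligned} \log M_0 &\ge nH(X_1 X_2 \cdots X_N) - \sum_{j=1}^{n} \Gamma_1(\delta_2^{(j)}) && (44)\\ &\ge nH(X_1 X_2 \cdots X_N) - n\Gamma_1\!\left(\frac{1}{n} \sum_{j=1}^{n} \delta_2^{(j)}\right), && (45) \end{aligned}$$

where the last step follows from the concavity of $\Gamma_1(\cdot)$. Now define

$$\eta = \frac{1}{n} \sum_{j=1}^{n} \delta_2^{(j)}. \qquad (46)$$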



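The following lemma gives an upper bound on $\eta$.

Lemma 4: For any $(n, M_0, M_1, \ldots, M_N)$ code that satisfies (12), (13) and (14), we have

$$\eta \le (N+1)\epsilon. \qquad (47)$$

Proof: By Fano's inequality, we have, for $i = 1, 2, \ldots, N$,

$$H(X_i^n | M_0 M_i) \le n\epsilon. \qquad (48)$$

Hence, for $i = 1, 2, \ldots, N$,

$$\begin{aligned} \log M_i &\ge H(M_i) && (49)\\ &\ge H(M_i | M_0) && (50)\\ &= H(X_i^n M_i | M_0) - H(X_i^n | M_i M_0) && (51)\\ &\ge H(X_i^n M_i | M_0) - n\epsilon && (52)\\ &\ge H(X_i^n | M_0) - n\epsilon. && (53) \end{aligned}$$

Then, we get

$$\sum_{i=1}^{N} \log M_i \ge \sum_{i=1}^{N} H(X_i^n | M_0) - n\epsilon', \qquad (54)$$

where $\epsilon' = N\epsilon$. Together with (41), we get

$$\sum_{i=0}^{N} \log M_i \ge nH(X_1 X_2 \cdots X_N) - \sum_{j=1}^{n} H(X_{1j} X_{2j} \cdots X_{Nj} | W_j) + \sum_{i=1}^{N} H(X_i^n | M_0) - n\epsilon'. \qquad (55)$$

Together with (14), we get

$$\sum_{i=1}^{N} H(X_i^n | M_0) - \sum_{j=1}^{n} H(X_{1j} X_{2j} \cdots X_{Nj} | W_j) \le n\epsilon'', \qquad (56)$$

where $\epsilon'' = (N+1)\epsilon$. On the other hand, we have

$$\begin{aligned} \sum_{i=1}^{N} H(X_i^n | M_0) &= \sum_{i=1}^{N} \sum_{j=1}^{n} H(X_{ij} | X_i^{j-1}, M_0) && (57)\\ &\ge \sum_{i=1}^{N} \sum_{j=1}^{n} H(X_{ij} | X_1^{j-1}, X_2^{j-1}, \ldots, X_N^{j-1}, M_0) && (58)\\ &= \sum_{i=1}^{N} \sum_{j=1}^{n} H(X_{ij} | W_j). && (59) \end{aligned}$$

Combining (56) and (59), we have

$$\sum_{j=1}^{n} \left[ \sum_{i=1}^{N} H(X_{ij} | W_j) - H(X_{1j} X_{2j} \cdots X_{Nj} | W_j) \right] \le n\epsilon''. \qquad (60)$$

Hence, we have

$$\eta = \frac{1}{n} \sum_{j=1}^{n} \delta_2^{(j)} \le \epsilon'' = (N+1)\epsilon. \qquad (61)$$

This completes the proof of Lemma 4. □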
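Now, from (12), (45) and Lemma 4, we get

$$R_0 \ge \frac{1}{n} \log M_0 \ge H(X_1, X_2, \ldots, X_N) - \Gamma_1(\eta) \ge H(X_1, X_2, \ldots, X_N) - \Gamma_1((N+1)\epsilon), \qquad (62)$$

where the last step holds because $\Gamma_1(\cdot)$ is non-decreasing (a larger $\delta$ enlarges the feasible set in (31)). Together with the continuity of $\Gamma_1(\cdot)$, letting $n \to \infty$ and $\epsilon \to 0$, we have

$$\begin{aligned} R_0 &\ge H(X_1, X_2, \ldots, X_N) - \Gamma_1(0) && (63)\\ &= C(X_1, X_2, \ldots, X_N), && (64) \end{aligned}$$

where (64) follows from (34), since $\Gamma_1(0) = \Gamma(0, 0)$. This completes the proof of the converse part. □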

We now prove the achievability part; that is, for a given joint distribution $P(x_1, x_2, \ldots, x_N)$, we have

Theorem 3 (Achievability):

$$C_1 \le C(X_1, X_2, \ldots, X_N). \qquad (65)$$



Our proof mainly involves generalizing the Gray-Wyner source coding network [14] to N sources. The system model we consider here is the N-decoder extension of Fig. 1 described in Section II, except that Definition 2 is replaced by the following.

Definition 5: A rate tuple $(R_0, R_1, \ldots, R_N)$ is said to be achievable if for all $\epsilon > 0$, we can find an $n$ sufficiently large such that there exists an $(n, 2^{nR_0}, 2^{nR_1}, \ldots, 2^{nR_N})$ code with

$$P_e^{(n)} \le \epsilon. \qquad (66)$$



Our purpose is to find all achievable rate tuples $(R_0, R_1, \ldots, R_N)$. The rate region of this source coding problem is summarized in the following theorem.

Theorem 4: For the source coding model described above, a rate tuple $(R_0, R_1, \ldots, R_N)$ is achievable if and only if the following conditions are satisfied:

$$R_0 \ge I(X_1, X_2, \ldots, X_N; W), \qquad (67)$$

$$R_i \ge H(X_i | W), \qquad (68)$$

for $i = 1, 2, \ldots, N$, and for some $W \sim P(w | x_1, x_2, \ldots, x_N)$, where $W \in \mathcal{W}$ and $|\mathcal{W}| \le \prod_{i=1}^{N} |\mathcal{X}_i| + 2$.
Proof of Theorem 4 (Sketch):

For the achievability part, we want to show that for any rate tuple $(R_0, R_1, \ldots, R_N)$ that satisfies the above conditions, we can construct an $(n, 2^{nR_0}, 2^{nR_1}, \ldots, 2^{nR_N})$ code such that the decoding error $P_e^{(n)} \to 0$ as the codeword length $n \to \infty$.

Codeword Generation: For the given distributions $P(x_1, x_2, \ldots, x_N)$ and $P(w | x_1, x_2, \ldots, x_N)$, we calculate the marginal distribution $P(w)$.

1) Codebook $\mathcal{C}_0$: we first randomly generate $2^{nR_0}$ sequences $w^n$ i.i.d. $\sim P(w)$ and index them by $m_0 \in \{1, 2, \ldots, 2^{nR_0}\}$.

2) Codebook $\mathcal{C}(X_i)$: for each $i = 1, 2, \ldots, N$, we randomly assign each $x_i^n \in \mathcal{X}_i^n$ to one of $2^{nR_i}$ bins and index the bins by $m_i \in \{1, 2, \ldots, 2^{nR_i}\}$.
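The random codebook construction can be sketched in a few lines for toy block lengths. This is an illustration under our own assumptions (rates rounded so that $2^{nR}$ is an integer, exhaustive enumeration of $\mathcal{X}_i^n$, hypothetical function name `generate_codebooks`), not an implementation from the paper; typicality encoding and decoding are omitted.

```python
import random
from itertools import product


def generate_codebooks(n, R0, rates, pw, alphabets, seed=0):
    """Toy random codebook C_0 and random binning for C(X_1), ..., C(X_N).
    n: block length; R0, rates: rates in bits/symbol; pw: dict w -> P(w);
    alphabets: list of the source alphabets X_1, ..., X_N."""
    rng = random.Random(seed)
    ws, weights = zip(*pw.items())
    size0 = 2 ** round(n * R0)
    # C_0: 2^{nR_0} sequences w^n drawn i.i.d. from P(w), indexed m_0 = 1..2^{nR_0}.
    C0 = {m0: tuple(rng.choices(ws, weights=weights, k=n))
          for m0 in range(1, size0 + 1)}
    bins = []
    for Ri, Xi in zip(rates, alphabets):
        n_bins = 2 ** round(n * Ri)
        # Each x_i^n is assigned independently and uniformly to one of the bins.
        bins.append({x: rng.randint(1, n_bins) for x in product(Xi, repeat=n)})
    return C0, bins


C0, bins = generate_codebooks(n=8, R0=0.5, rates=[0.5, 0.5, 0.5],
                              pw={0: 0.5, 1: 0.5}, alphabets=[(0, 1)] * 3)
print(len(C0), [len(set(b.values())) for b in bins])
```

Enumerating every $x_i^n$ grows as $|\mathcal{X}_i|^n$, so this only makes sense for very small $n$; in the proof the bins are a conceptual device rather than a stored table.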

Encoding:

1) For each tuple of source sequences $(x_1^n, x_2^n, \ldots, x_N^n)$, encoder $f_0$ finds a $w^n(m_0) \in \mathcal{C}_0$ such that $(x_1^n, x_2^n, \ldots, x_N^n, w^n(m_0)) \in T_\epsilon^{(n)}$, where $T_\epsilon^{(n)}$ is the jointly typical set as defined in [15], and sends the index $m_0$ to the decoders. If there is more than one such $w^n$, it chooses the one with the smallest index; if there is no such sequence, it chooses $w^n(1)$.

2) For $i = 1, 2, \ldots, N$, encoder $f_i$ sends the bin index $m_i$ of the sequence $x_i^n$.

Decoding: For $i = 1, 2, \ldots, N$, decoder $i$ looks in bin $m_i$ of codebook $\mathcal{C}(X_i)$ and finds the sequence $\hat{x}_i^n$ such that $(\hat{x}_i^n, w^n(m_0)) \in T_\epsilon^{(n)}$. If there is more than one such sequence or none, it declares an error.

Error analysis: Assume that $m_i$, $i = 0, 1, \ldots, N$, are the indices chosen for encoding $(x_1^n, x_2^n, \ldots, x_N^n)$. There are three error events:

1) $E_1$: $(x_1^n, x_2^n, \ldots, x_N^n, w^n(m_0)) \notin T_\epsilon^{(n)}$ for all $m_0 \in \{1, 2, \ldots, 2^{nR_0}\}$.

2) $E_2$: $(x_i^n, w^n(m_0)) \notin T_\epsilon^{(n)}$ for some $i$.

3) $E_3$: for some $i$, there exists $\tilde{x}_i^n \neq x_i^n$ in bin $m_i$ of codebook $\mathcal{C}(X_i)$ such that $(\tilde{x}_i^n, w^n(m_0)) \in T_\epsilon^{(n)}$.

Hence,

$$P_e^{(n)} \le P(E_1) + P(E_2 | E_1^c) + P(E_3 | E_1^c, E_2^c). \qquad (69)$$

By standard arguments, we get, as $n \to \infty$:

1) $P(E_1) \to 0$ if
$$R_0 > I(X_1, X_2, \ldots, X_N; W) + \epsilon, \qquad (70)$$

2) $P(E_2 | E_1^c) \to 0$,

3) $P(E_3 | E_1^c, E_2^c) \to 0$ if for each $i = 1, 2, \ldots, N$,
$$R_i > H(X_i | W) + \epsilon. \qquad (71)$$

This completes the achievability proof.

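For the converse part, we want to show that any achievable rate tuple $(R_0, R_1, \ldots, R_N)$ satisfies (67) and (68).

By Fano's inequality, we have

$$H(X_i^n | M_i M_0) \le n\epsilon. \qquad (72)$$

Hence, for $i = 1, 2, \ldots, N$,

$$\begin{aligned} nR_i &\ge H(M_i) && (73)\\ &\ge H(M_i | M_0) && (74)\\ &\ge H(M_i | M_0) + H(X_i^n | M_i M_0) - n\epsilon && (75)\\ &= H(X_i^n M_i | M_0) - n\epsilon && (76)\\ &\ge H(X_i^n | M_0) - n\epsilon && (77)\\ &= \sum_{j=1}^{n} H(X_{ij} | M_0, X_i^{j-1}) - n\epsilon && (78)\\ &\ge \sum_{j=1}^{n} H(X_{ij} | M_0, X_1^{j-1}, X_2^{j-1}, \ldots, X_N^{j-1}) - n\epsilon, && (79) \end{aligned}$$

and

$$\begin{aligned} nR_0 &\ge H(M_0) && (80)\\ &\ge I(M_0; X_1^n, X_2^n, \ldots, X_N^n) && (81)\\ &= \sum_{j=1}^{n} I(M_0; X_{1j} X_{2j} \cdots X_{Nj} | X_1^{j-1} X_2^{j-1} \cdots X_N^{j-1}) && (82)\\ &= \sum_{j=1}^{n} I(M_0, X_1^{j-1}, X_2^{j-1}, \ldots, X_N^{j-1}; X_{1j}, X_{2j}, \ldots, X_{Nj}), && (83) \end{aligned}$$

where (83) holds because $(X_{1j}, \ldots, X_{Nj})$ is independent of the past $(X_1^{j-1}, \ldots, X_N^{j-1})$. Define $W_j = (M_0, X_1^{j-1}, X_2^{j-1}, \ldots, X_N^{j-1})$. Using a standard time-sharing argument, we get, for $i = 1, \ldots, N$,

$$R_i \ge H(X_i | W) - \epsilon, \qquad (84)$$

$$R_0 \ge I(X_1 X_2 \cdots X_N; W). \qquad (85)$$

Letting $n \to \infty$, so that $\epsilon \to 0$, completes the proof of the converse. The cardinality bound can be obtained using the technique introduced in [16, Appendix C]; we skip the details. This completes the proof of Theorem 4. □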

Now we proceed to prove Theorem 3. We will show that if $R_0 > C(X_1, X_2, \ldots, X_N)$, then $R_0$ is achievable for Model I, i.e., in the sense of Definition 2.

Let $R_0 > C(X_1, X_2, \ldots, X_N)$ and any $\epsilon > 0$ be given, and let the random variables $(X_1, X_2, \ldots, X_N, W)$ satisfy (7) and (8) such that

$$C(X_1, X_2, \ldots, X_N) = I(X_1 X_2 \cdots X_N; W). \qquad (86)$$

Notice that the existence of such random variables is guaranteed by Lemma 3. Now define

$$\epsilon_1 = \min\left\{ \frac{\epsilon}{N+1},\ R_0 - C(X_1, X_2, \ldots, X_N) \right\}, \qquad (87)$$

and hence $\epsilon_1 > 0$. By Theorem 4, there exists an $(n, M_0, M_1, \ldots, M_N)$ code with $P_e^{(n)} \le \epsilon'$ and $\epsilon' < \epsilon_1$. Hence,

$$\frac{1}{n} \log M_0 \le C(X_1, X_2, \ldots, X_N) + \epsilon_1 \le R_0, \qquad (88)$$

$$\frac{1}{n} \log M_i \le H(X_i | W) + \epsilon_1. \qquad (89)$$

Hence, we have

$$\begin{aligned} \frac{1}{n} \sum_{i=0}^{N} \log M_i &\le C(X_1, X_2, \ldots, X_N) + \sum_{i=1}^{N} H(X_i | W) + \epsilon && (90)\\ &\overset{(a)}{=} H(X_1, X_2, \ldots, X_N) + \epsilon, && (91) \end{aligned}$$

where (90) uses $(N+1)\epsilon_1 \le \epsilon$, and (a) follows from condition (8), since $\sum_{i=1}^{N} H(X_i | W) = H(X_1, \ldots, X_N | W)$. Thus, condition (14) is also satisfied, while (12) and (13) follow from (88) and $\epsilon' < \epsilon_1 \le \epsilon$. This implies that $R_0$ is achievable in Model I, which completes the proof of the achievability part. This completes the proof of Theorem 3. □
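Step (a) in (91) uses the identity $C(X_1, \ldots, X_N) + \sum_{i=1}^{N} H(X_i|W) = H(X_1, \ldots, X_N)$ for the $W$ in (86). As a sanity check on the BSC example of Section III, where that $W$ is the common fair bit, the sketch below (hypothetical parameters $a_1 = 0.15$, $N = 4$) computes the corner rates $R_0 = I(X_1, \ldots, X_N; W)$ and $R_i = H(X_i|W) = h(a_1)$ and confirms that their sum equals $H(X_1, \ldots, X_N)$.

```python
import math
from itertools import product


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


a1, N = 0.15, 4  # hypothetical parameters of the BSC model in Fig. 3
# Joint pmf of (X_1..X_N, W) with W ~ Bernoulli(1/2) and independent BSC(a1) links.
pxw = {(x, w): 0.5 * math.prod(1 - a1 if xi == w else a1 for xi in x)
       for x in product((0, 1), repeat=N) for w in (0, 1)}
px = {x: pxw[(x, 0)] + pxw[(x, 1)] for x in product((0, 1), repeat=N)}

H_X = entropy(px.values())
R0 = H_X + 1.0 - entropy(pxw.values())  # I(X_1..X_N; W), using H(W) = 1
Ri = [entropy([a1, 1 - a1])] * N        # H(X_i | W) = h(a1) for each decoder
print(R0 + sum(Ri), H_X)                # total rate vs. the entropy bound
```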

B. Proof of $C_2 = C(X_1, X_2, \ldots, X_N)$

In this section, we prove the second part of Theorem 1, that is, $C_2 = C(X_1, X_2, \ldots, X_N)$. We have the following theorem.

Theorem 5:

$$C_2 \ge C(X_1, X_2, \ldots, X_N), \qquad (92)$$

$$C_2 \le C(X_1, X_2, \ldots, X_N). \qquad (93)$$

For the converse part, that is, (92), the proof follows almost the same lines as in [4, Section 5.2]. For the achievability part, that is, (93), the proof follows similarly as in [4, Section 6.2] by applying $\mathcal{U} = \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_N$ in [4, Theorem 6.3]. We omit the details here.

References

[16] A. El Gamal and Y. H. Kim, "Lecture notes on network information theory," http://arxiv.org/abs/1001.3404, 2010.